Data-driven Planning via Imitation Learning
Authors
Abstract
Robot planning is the process of selecting a sequence of actions that optimize for a task-specific objective. For instance, the objective for a navigation task would be to find collision-free paths, while the objective for an exploration task would be to map unknown areas. The optimal solutions to such tasks are heavily influenced by the implicit structure in the environment, i.e. the configuration of objects in the world. State-of-the-art planning approaches, however, do not exploit this structure, thereby expending valuable effort searching the action space instead of focusing on potentially good actions. In this paper, we address the problem of enabling planners to adapt their search strategies by inferring such good actions in an efficient manner, using only the information uncovered by the search up to that point. We formulate this as a problem of sequential decision making under uncertainty, where at a given iteration a planning policy must map the state of the search to a planning action. Unfortunately, the training process for such partial-information-based policies is slow to converge and susceptible to poor local minima. Our key insight is that if we could fully observe the underlying world map, we would easily be able to disambiguate between good and bad actions. We hence present a novel data-driven imitation learning framework to efficiently train planning policies by imitating a clairvoyant oracle: an oracle that at train time has full knowledge of the world map and can compute optimal decisions. We leverage the fact that for planning problems such oracles can be computed efficiently, and derive performance guarantees for the learnt policy. We examine two important domains that rely on partial-information-based policies: informative path planning and search-based motion planning. We validate the approach on a spectrum of environments for both problem domains, including experiments on a real UAV, and show that the learnt policy consistently outperforms state-of-the-art algorithms. Our framework is able to train policies that achieve up to 39% more reward than state-of-the-art information-gathering heuristics and a 70x speedup compared to A* on search-based planning problems. Our approach paves the way forward for applying data-driven techniques to other such problem domains under the umbrella of robot planning.
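The training scheme described in the abstract — roll out a partially informed planning policy while a clairvoyant oracle, which sees the full world map at train time, labels the best action at each search state the policy visits — can be pictured as a DAgger-style data-aggregation loop. The sketch below is only illustrative and is not the authors' implementation; the helpers `sample_world`, `make_search_state`, `oracle_best_action`, and `featurize` are hypothetical stand-ins for problem-specific components, and a discrete action set is assumed.

```python
# Minimal, illustrative sketch (not the paper's released code) of training a
# planning policy by imitating a clairvoyant oracle in a DAgger-style loop.
# sample_world, make_search_state, oracle_best_action, featurize are
# hypothetical helpers standing in for problem-specific components.
from sklearn.tree import DecisionTreeClassifier


def train_planning_policy(sample_world, make_search_state, oracle_best_action,
                          featurize, num_iters=10, episodes_per_iter=20,
                          horizon=50):
    """Roll out the current (partially informed) policy on sampled worlds,
    label every visited search state with the action a clairvoyant oracle
    (which sees the full world map) would take, then retrain on all data."""
    data_x, data_y = [], []
    policy = None  # on the first iteration, roll out the oracle itself
    for _ in range(num_iters):
        for _ in range(episodes_per_iter):
            world = sample_world()              # full map (train-time only)
            state = make_search_state(world)    # exposes only uncovered info
            for _ in range(horizon):
                x = featurize(state)            # partial-information features
                a_star = oracle_best_action(world, state)  # oracle label
                data_x.append(x)
                data_y.append(a_star)
                # Execute the learner's own action so training visits the
                # states the policy induces (oracle action on iteration 0).
                a = a_star if policy is None else int(policy.predict([x])[0])
                state = state.step(a)
                if state.done:
                    break
        # Retrain the policy on the aggregated, oracle-labelled dataset.
        policy = DecisionTreeClassifier().fit(data_x, data_y)
    return policy
```

Rolling out the learner rather than the oracle keeps the training distribution close to the search states the policy will actually encounter at test time, which is the standard rationale behind DAgger-style guarantees of the kind the abstract refers to.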
Similar resources
Learning to Search via Self-Imitation
We study the problem of learning a good search policy. To do so, we propose the self-imitation learning setting, which builds upon imitation learning in two ways. First, self-imitation uses feedback provided by retrospective analysis of demonstrated search traces. Second, the policy can learn from its own decisions and mistakes without requiring repeated feedback from an external expert. Combin...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning
In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go oracle on the planning horizon and demonstrate that the cost-to-go oracle shortens the learner's planning horizon as a function of its accuracy: a globally optimal oracle can shorten the planning horizon to one, leading t...
Driving Like a Human: Imitation Learning for Path Planning using Convolutional Neural Networks
Human-like path planning is still a challenging task for automated vehicles. Imitation learning can teach these vehicles to learn planning from human demonstration. In this work, we propose to formulate the planning stage as a convolutional neural network (CNN). Thus, we can employ well-established CNN techniques to learn planning from imitation. With the proposed method, we train a network for...
Journal title: CoRR
Volume: abs/1711.06391, Issue: -
Pages: -
Publication date: 2017